News Authorship Identification with Deep Learning

Authors

  • Liuyu Zhou
  • Huafei Wang
Abstract

Authorship identification selects the most likely author, from a group of candidate authors, of an academic article, news story, email, or forum message. It can be used to find the original author of an uncited article, to detect plagiarism, and to classify messages as spam or non-spam. In this project, we tackled this classification task at the author, article, sentence, and word levels using various deep and non-deep classification algorithms, with GloVe vectors as the pre-trained word vectors. Among all the algorithms, the sentence-level Recurrent Neural Network (RNN) achieves the best performance, since it captures context information as well as word and sentence order from the training dataset.
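To make the article-level framing concrete, here is a minimal sketch of a non-deep baseline: each article is represented by the average of its word vectors, and it is assigned to the candidate author whose mean article vector is most similar. This is not the authors' method. The tiny 3-dimensional `EMBEDDINGS` table is a hypothetical stand-in for pre-trained GloVe vectors, and the nearest-centroid rule and the names `article_vector`, `train_centroids`, and `predict` are assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings standing in for pre-trained GloVe vectors
# (real GloVe vectors have 50 to 300 dimensions).
EMBEDDINGS = {
    "market": [0.9, 0.1, 0.0],
    "stocks": [0.8, 0.2, 0.1],
    "shares": [0.85, 0.15, 0.05],
    "goal": [0.1, 0.9, 0.0],
    "match": [0.0, 0.8, 0.2],
    "team": [0.05, 0.85, 0.1],
}
DIM = 3


def article_vector(text):
    """Average the embeddings of known words; unknown words are skipped."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(labeled_articles):
    """Compute one mean article vector (centroid) per candidate author."""
    grouped = defaultdict(list)
    for author, text in labeled_articles:
        grouped[author].append(article_vector(text))
    return {a: [sum(col) / len(vs) for col in zip(*vs)] for a, vs in grouped.items()}


def predict(centroids, text):
    """Return the candidate author whose centroid is most similar to the article."""
    v = article_vector(text)
    return max(centroids, key=lambda a: cosine(centroids[a], v))


# Usage with hypothetical training articles.
train = [
    ("alice", "market stocks shares"),
    ("alice", "stocks market"),
    ("bob", "goal match team"),
    ("bob", "team goal"),
]
centroids = train_centroids(train)
print(predict(centroids, "shares market"))  # prints "alice"
```

Because averaging discards word order, "market goal" and "goal market" receive identical vectors. That lost word and sentence order is exactly the sequence information the abstract credits for the sentence-level RNN's advantage.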

Similar resources

Binary Code Multi-Author Identification in Multi-Toolchain Scenarios

Knowing the authors of a binary program has significant application to forensic analysis of malicious software (malware), software supply chain risk management, and software plagiarism detection. As different compilation toolchains may generate drastically different binary code for the same source code, it is essential to be able to reliably identify authors across multiple toolchains. However,...

Linguistic correlates of style: authorship classification with deep linguistic analysis features

The identification of authorship falls into the category of style classification, an interesting sub-field of text categorization that deals with properties of the form of linguistic expression as opposed to the content of a text. Various feature sets and classification methods have been proposed in the literature, geared towards abstracting away from the content of a text, and focusing on its ...

On the Benefit of Combining Neural, Statistical and External Features for Fake News Identification

Identifying the veracity of a news article is an interesting problem while automating this process can be a challenging task. Detection of a news article as fake is still an open question as it is contingent on many factors which the current state-of-the-art models fail to incorporate. In this paper, we explore a subtask to fake news identification, and that is stance detection. Given a news ar...

Authorship Identification in Large Email Collections: Experiments Using Features that Belong to Different Linguistic Levels - Notebook for PAN at CLEF 2011

The aim of this paper is to explore the usefulness of using features from different linguistic levels to email authorship identification. Using various email datasets provided by PAN’11 lab we tested several feature groups in both authorship attribution and authorship verification subtasks. The selected feature groups combined with Regularized Logistic Regression and One-Class SVM machine learni...

A Framework for Authorship Identification in the Internet Environment

Misuse of anonymous online communication for illegal purposes has become a major concern [2,12]. In this paper, we present a framework named ART (Authorship Recognition Tool), that is designed to minimize manual procedures and maximize the efficiency of authorship identification based on the content of Internet electronic documents. The framework covers the phases of document retrieval and data...

Publication date: 2016